CMS Software Distribution on the LCG and OSG Grids 
Abstract

The efficient exploitation of worldwide distributed storage
and computing resources available in the grids requires a
robust, transparent and fast deployment of experiment
specific software. The approach followed by the CMS
experiment at CERN in order to enable Monte-Carlo simulations,
data analysis and software development in an international
collaboration is presented. The current status and future
improvement plans are described.

INTRODUCTION 

The CMS (Compact Muon Solenoid) experiment [1, 2]
of high energy physics is located in an underground cavern
at the Large Hadron Collider (LHC) currently under
construction at CERN, Geneva, Switzerland [3]. The
international collaboration tackling the difficult task of
constructing and running the complex detector weighing 12,500
tons comprises about 2000 scientists and engineers from
160 institutions in 37 countries. With an expected recording
rate of 150 events per second and event sizes of about
1.5 MB, the huge amount of 1500 TB of collected data per
year has to be distributed on grids in order to be stored
and analyzed. The efficient exploitation of the worldwide
distributed data and the computing resources available in
the grids requires a robust, transparent and fast deployment
of the experiment specific software that, especially in the
start-up phase, will be rapidly developing. The approach
followed by the CMS experiment in order to achieve this
goal is presented, and the current status of the implementations
within the LHC Computing Grid (LCG) [4, 5] and the Open
Science Grid (OSG) [6, 7] as well as future improvement plans
are described.
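
As a rough cross-check of the figures quoted above, the following
sketch multiplies the recording rate by the event size and asks how
much live recording time per year the quoted 1500 TB would correspond
to. The decimal unit convention (1 TB = 10^6 MB) and the derived live
time are assumptions of this illustration and are not stated in the
text.

```python
# Rough cross-check of the data volume quoted in the introduction.
# Assumption (not from the text): decimal units, 1 TB = 1e6 MB.

EVENT_RATE_HZ = 150.0      # events per second, as quoted
EVENT_SIZE_MB = 1.5        # MB per event, as quoted
YEARLY_VOLUME_TB = 1500.0  # TB per year, as quoted
MB_PER_TB = 1.0e6


def throughput_mb_per_s(rate_hz: float, size_mb: float) -> float:
    """Sustained recording throughput in MB/s."""
    return rate_hz * size_mb


def implied_live_seconds(volume_tb: float, rate_hz: float, size_mb: float) -> float:
    """Recording time needed to accumulate volume_tb at the given rate."""
    return volume_tb * MB_PER_TB / throughput_mb_per_s(rate_hz, size_mb)


if __name__ == "__main__":
    rate = throughput_mb_per_s(EVENT_RATE_HZ, EVENT_SIZE_MB)
    seconds = implied_live_seconds(YEARLY_VOLUME_TB, EVENT_RATE_HZ, EVENT_SIZE_MB)
    print(f"throughput: {rate:.0f} MB/s")
    print(f"implied live time: {seconds:.3g} s ({seconds / 86400:.0f} days)")
```

Under these assumptions the quoted yearly volume corresponds to a
recording throughput of 225 MB/s sustained for several million
seconds per year.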

SOFTWARE PREPARATION 

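Before the rapidly developing experiment software can
be distributed, a couple of preparatory steps have to be
performed. They are listed in the following together with a
short explanation and, if applicable, the solution adopted
by CMS: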



Release: The final content of a project has to be fixed.
Within CMS the software projects are managed by
SCRAM (Software Configuration, Release And
Management) [8]; the collection of the latest updates is
done via NICOS (Nightly Control System) [9]. See
also reference [10].

Packaging: All software components have to be
packaged in archives suited for distribution on grids.
CMS adopted the RPM (Red Hat Package
Manager) [11] format, see also [12].

Testing: A test installation has to be performed fol- 
lowed up by a validation. 

Archiving: All produced packages have to be backed
up; for CMS they are stored on tapes managed by
CASTOR (CERN Advanced Storage Manager) [13, 14].

Web/Grid Storage: The validated software archives
have to be put into a repository accessible by web or
grid tools. CMS employs a web server for this task;
in addition, the archived copies on CASTOR can be
accessed via grid tools.

Publication: New releases ready for distribution have 
to be announced. This is done using the same web 
server as for the repository. 

Mirroring: Ideally, to avoid overloading the primary 
repository, mirrors should be set up. 

SOFTWARE DISTRIBUTION 



Generic View 

Once the software has been prepared and the required
services have been set up, the actual distribution on the grids
can start. In figure 1 a generic view of the related services
and their interconnections is presented. The software
distribution service, for which the rectangular box provides some
more details, consists of four basic steps: submission,
installation, validation and publication. In comparison to
local software installations additional grid services come
into play. To avoid misuse or unintended destructive
actions the submission, which might be initiated in a
managed or automated manner, has to be authorized for access
to the software storage area. In addition, information
retrieved from a bookkeeping service can prevent unwanted
(or double) submissions. The next two steps are similar
to the point "Testing" of the software preparation section
with the added complication that the actions have to be
registered. In parallel to the distribution of new releases
the current status of all participating grid sites is monitored
so that changes in availability or, e.g., deteriorated systems
can be reported to an error treatment service and the entries
in the bookkeeping are updated correspondingly. At the
same time the monitoring can be used to trigger
installation submissions of new releases in an automated manner.
Any other problems occurring for example in the
installation or validation phases are reported to the error treatment
as well.
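
The chain described above can be sketched as follows. This is a
minimal illustration only, not the XCMSI implementation: the class
Bookkeeping, the function distribute and the callables passed in as
actions are hypothetical names, and the bookkeeping service is
replaced by an in-memory dictionary.

```python
from dataclasses import dataclass, field

# The four basic steps named in the text, in order.
STEPS = ("submission", "installation", "validation", "publication")


@dataclass
class Bookkeeping:
    """In-memory stand-in for a bookkeeping service (hypothetical)."""
    records: dict = field(default_factory=dict)  # (site, release) -> last completed step

    def is_published(self, site, release):
        return self.records.get((site, release)) == "publication"

    def register(self, site, release, step):
        self.records[(site, release)] = step


def distribute(site, release, submitter, authorized, bookkeeping, actions):
    """Run the four steps in order and return (ok, message)."""
    # Authorization guards access to the software storage area.
    if submitter not in authorized:
        return False, "submitter not authorized for the software area"
    # Bookkeeping information prevents unwanted (or double) submissions.
    if bookkeeping.is_published(site, release):
        return False, "release already published at this site"
    for step in STEPS:
        if not actions[step](site, release):
            # In a real system this would go to the error treatment service.
            return False, f"{step} failed"
        bookkeeping.register(site, release, step)  # every action is registered
    return True, "published"


if __name__ == "__main__":
    book = Bookkeeping()
    always_ok = {step: (lambda site, release: True) for step in STEPS}
    print(distribute("site-a", "PROJECT_1_0_0", "esm", {"esm"}, book, always_ok))
    print(distribute("site-a", "PROJECT_1_0_0", "esm", {"esm"}, book, always_ok))
```

Recording only the last completed step keeps the sketch short while
still letting a failed installation be resubmitted later; a
production bookkeeping service would presumably keep a full history.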



Implementation within LCG 

Within the LHC Computing Grid the whole chain of
submission, installation, validation and publication is
performed by the tool XCMSI [15, 16]; the publication occurs
according to the GLUE (Grid Laboratory Uniform
Environment) scheme [4, 17]. XCMSI also comprises a
monitoring service allowing automated submissions but no
dedicated bookkeeping apart from a simple web page; more
details can be found in the section on monitoring and
bookkeeping. The submissions are authorized using X509 grid
certificates of the mapping account cmssgm of the
experiment software manager (ESM). The only means of error
treatment currently implemented is the Savannah web page
of the XCMSI project [18].



Implementation within OSG 

In the Open Science Grid the software installations
are submitted from the CMS Software Deployment GUI
(Graphical User Interface) by an experiment software
manager, again authorized by X509 grid certificates for the role
of cmssoft. The installation is done with XCMSI as
within LCG. To validate the software, however, a series of
Monte-Carlo production jobs is run employing the
Monte-Carlo Production Service (MCPS) [19]. The result is
published in the GLUE scheme as well as in the dedicated
bookkeeping database CMSSoftDB [20]. A continuous
monitoring has not been foreseen; the only available error
treatment is again the XCMSI Savannah page.



Comparison 

In general the experience from the Data Challenge
04 [21] has led to rather similar setups of the software
deployment on the LCG and OSG grids. Some components,
especially the validation procedures and the escalation
handling in case of problems, have to be further developed.

MONITORING AND BOOKKEEPING 

As can be seen from figure 1 the task of software
deployment is not considered to be completed after just one
initial distribution of a new release. The current status has
to be monitored continuously in order to update the
published information on grid sites according to the actual
situation and to take preventive measures in case of problems.
Within LCG the basic availability of a grid site and the
correct functioning of the corresponding compute and
storage elements is already monitored in the framework of the
site functional tests (SFT). In principle it is possible to add
experiment specific tests on the installed software as well,
which would have the additional advantage that the jobs are
run in privileged (express) queues. If a validation, however,
requires that files within the experiment software area are
modified or created, it would be necessary that not only the
experiment software manager but also the SFTs run by the
LCG integration team (dteam) are authorized to have write
access. The same would be required if the monitoring is not
only run in a passive testing mode but also in an active mode
that allows known problems to be fixed automatically or
that even submits installation jobs for new releases. Since
this is not very desirable, a better solution should be found
to prioritize software monitoring and installation jobs of the
experiment software manager with respect to normal ones.
This could be implemented for example in the framework
of the Virtual Organization Membership Service (VOMS)
where it is possible to attribute different authorizations and
prioritizations to a grid certificate depending on the role the
submitter assumes [4].
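
The idea of role-dependent authorization and prioritization can be
sketched as follows. This is a toy model only: it does not use any
VOMS interface, and the role labels "software-manager" and "user",
the class RolePolicy and the priority values are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class RolePolicy:
    """What a certificate may do when its holder assumes a given role."""
    may_write_software_area: bool
    priority: int  # lower value means scheduled earlier


# Hypothetical role labels and values, chosen for illustration only.
POLICIES = {
    "software-manager": RolePolicy(may_write_software_area=True, priority=0),
    "user": RolePolicy(may_write_software_area=False, priority=10),
}


def policy_for(role):
    """Unknown roles fall back to the ordinary user policy."""
    return POLICIES.get(role, POLICIES["user"])


def schedule(jobs):
    """Order (job_name, role) pairs by role priority, then submission order."""
    heap = [
        (policy_for(role).priority, index, name)
        for index, (name, role) in enumerate(jobs)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    queue = [
        ("analysis-1", "user"),
        ("install-release", "software-manager"),
        ("analysis-2", "user"),
    ]
    print(schedule(queue))
```

With such a scheme the software manager's monitoring and installation
jobs would overtake normal jobs without any write access having to be
granted to a second group.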

The aforementioned XCMSI project contains a moni- 
toring tool that is already run regularly (in unprivileged 
queues though) in order to run CMS specific tests e.g. to 
check the read/write permissions of the experiment soft- 
ware area, the CMS attributed architecture (operating sys- 
tem) of the compute element, the availability of the RPM 
database of installed packages and the accessibility of the 
published software projects. The collected data are pub- 
lished in the form of a web page presenting also some ad- 
ditional information on the outcome of the last test and a 
history file containing the past test results. 
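
A few of the checks listed above can be sketched with the Python
standard library. This is an illustration under stated assumptions,
not the XCMSI monitoring code: the function names, the layout with
one directory per published project and the location of the RPM
database are hypothetical and would differ from site to site.

```python
import os
import platform
import tempfile


def check_software_area(area, rpm_db_dir, published_projects):
    """Return a dict mapping check name to True/False for one software area."""
    results = {
        # Read/write permissions of the experiment software area.
        "area_readable": os.access(area, os.R_OK | os.X_OK),
        "area_writable": os.access(area, os.W_OK),
        # Availability of the RPM database of installed packages
        # (its location is an assumption of this sketch).
        "rpm_db_present": os.path.isdir(rpm_db_dir),
    }
    # Accessibility of each published project (assumed: one directory each).
    for project in published_projects:
        results[f"project:{project}"] = os.path.isdir(os.path.join(area, project))
    return results


def report(results):
    """One-line summary including the architecture of the executing node."""
    architecture = platform.machine() or "unknown"
    failed = sorted(name for name, ok in results.items() if not ok)
    status = "OK" if not failed else "FAILED: " + ", ".join(failed)
    return f"arch={architecture} status={status}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as area:
        os.mkdir(os.path.join(area, "rpmdb"))
        os.mkdir(os.path.join(area, "PROJECT_1_0_0"))
        outcome = check_software_area(
            area, os.path.join(area, "rpmdb"), ["PROJECT_1_0_0", "PROJECT_2_0_0"]
        )
        print(report(outcome))
```

Appending each such summary line to a file would give a simple
history of past test results of the kind mentioned above.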

Figure 1: A generic view showing the required services in order to prepare, distribute and maintain experiment specific
software on a grid. Labelled arrows indicate their interconnections, the rectangular box provides some more details on the
software distribution service.

A dedicated database to gather all the data in a
well-structured format is desirable. This would
also improve the capabilities of the automated
installation prototype contained in the XCMSI
monitoring that currently relies on GLUE tag information
like VO-cms-PROJECT-request-install alone. Such
a database has been implemented within the OSG:
CMSSoftDB [20]. It is based on a MySQL [22] database
and provides, among other things, a comprehensive overview
of the CMS software installation status in the Open Science
Grid. It is accessible via a web interface, the CMS Software
Deployment GUI [23, 24], that not only presents the
information collected in the database but also allows authorized
users to perform actions like job submissions or other
management tasks. The database is not coupled to a continuous
monitoring though.
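
How request tags and a bookkeeping database could work together can
be sketched as follows. The sketch is not CMSSoftDB: it uses SQLite
from the Python standard library as a stand-in for MySQL, and the
table layout, the status value "installed" and all function names are
hypothetical. Only the tag pattern VO-cms-PROJECT-request-install is
taken from the text, with PROJECT standing for a project name.

```python
import sqlite3

REQUEST_PREFIX = "VO-cms-"
REQUEST_SUFFIX = "-request-install"


def requested_projects(tags):
    """Extract project names from install-request tags, ignoring other tags."""
    projects = []
    for tag in tags:
        if tag.startswith(REQUEST_PREFIX) and tag.endswith(REQUEST_SUFFIX):
            name = tag[len(REQUEST_PREFIX):-len(REQUEST_SUFFIX)]
            if name:  # skip degenerate tags without a project name
                projects.append(name)
    return projects


def open_bookkeeping(path=":memory:"):
    """Open (and if needed create) the hypothetical bookkeeping table."""
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS installations ("
        " site TEXT NOT NULL, project TEXT NOT NULL, status TEXT NOT NULL,"
        " PRIMARY KEY (site, project))"
    )
    return connection


def pending_installs(connection, site, tags):
    """Requested projects not yet recorded as installed at the given site."""
    pending = []
    for project in requested_projects(tags):
        row = connection.execute(
            "SELECT status FROM installations WHERE site = ? AND project = ?",
            (site, project),
        ).fetchone()
        if row is None or row[0] != "installed":
            pending.append(project)
    return pending


if __name__ == "__main__":
    db = open_bookkeeping()
    db.execute("INSERT INTO installations VALUES ('site-a', 'PROJECT_A', 'installed')")
    site_tags = [
        "VO-cms-PROJECT_A-request-install",
        "VO-cms-PROJECT_B-request-install",
        "VO-cms-PROJECT_C",
    ]
    print(pending_installs(db, "site-a", site_tags))
```

Consulting the database in addition to the tags avoids resubmitting
an installation that has already succeeded but whose request tag has
not yet been removed.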

OUTLOOK 

A lot of progress has been made compared to the situ- 
ation during the Data Challenge 04. CMS can deploy the 
experiment specific software, distribute and analyze data 
and monitor the software status on the grids. The inter- 
operability between LCG and OSG has been improved as 
well, but some work remains to be done; in particular, a more
consistent view of the available information for users in
the two grids should be achieved. Concerning monitoring
and bookkeeping the efforts made within LCG and OSG 
could be nicely merged. Dedicated CPU/time slots for the 
ESM role are a must, but this can easily be done within the 
VOMS roles. The most important points remaining to be 
addressed are the error handling and escalation procedures 
as well as better validation suites. 